Bobcat Migration¶
Species Description¶
Bobcats are part of the larger genus Lynx. There are four extant species of Lynx, one of which is the bobcat. The bobcat is often confused with the other lynxes, but the key difference is that lynxes are larger and their tail tips are all black, whereas bobcats' tails are white underneath with a black stripe.
I chose to look at the migration of the bobcat (Lynx rufus) because I have a sister who works at the Sonoma County Wildlife Rescue in California. She works in the rescue-and-release part of the program; they frequently rescue injured bobcats, or take in abandoned bobcat kittens that they care for until the animals are old enough to be released. I, however, am in Colorado, where I have seen bobcats in the wilderness. This made me curious about their overall migration.
This species' range extends from Mexico to Canada, with the majority of bobcats found in the U.S. They are not a threatened species, but they do suffer from habitat loss. They "live in a wide variety of habitats, including boreal coniferous and mixed forests in the north, bottomland hardwood forests and coastal swamps in the southeast, and desert and scrublands in the southwest" (The Smithsonian National Zoo & Conservation Biology Institute). Another important note that will inform the later interpretation: bobcats do not hibernate and are most active during the winter, particularly January and February, which is their mating season (The Wildlife Rescue League).
If you are interested in learning more about bobcats, please visit The Smithsonian's National Zoo & Conservation Biology Institute and The Wildlife Rescue League.
Data Description¶
Two sources of data were used for this project: Ecoregions and GBIF data. The Ecoregions data is available as a shapefile, which can be downloaded here. The GBIF data records species occurrences. Together, the two can convey a species' migration spatially and over time (provided that species' data is available from GBIF).
Ecoregion Data¶
Ecoregions "are areas that are geographically and ecologically similar and experience similar levels of solar radiation and precipitation". There are 846 ecoregions globally, but most species occupy only a small portion of those ecoregions, depending on their habitats and migration patterns. The ecoregion shapefile used here is an updated version from 2017 that was developed by experts.
Ecoregion Citation:
Terauds, Aleks, et al. “Announcing the Release of Ecoregion Snapshots.” One Earth, One Earth, 31 May 2024, www.oneearth.org/announcing-the-release-of-ecoregion-snapshots/.
GBIF Occurrences Data¶
GBIF stands for Global Biodiversity Information Facility. It "is an international network and data infrastructure funded by the world's governments and aimed at providing anyone, anywhere, open access to data about all types of life on Earth" (Data Info citation below). This network draws many data sources together, including museum specimens, DNA barcodes, and crowdsourced photos taken on smartphones by experts and non-experts alike. GBIF uses data standards to index all these species records.
The data can vary:
There may be more occurrences recorded in national parks than in the Arctic, even if the species is similarly present in both regions, simply because there are fewer people in the Arctic to observe it.
There may be greater or fewer occurrences depending on the time of year that people go outside, how accessible a region is during different times of year, etc.
There may be variation depending on how many people want to provide/upload data, which in turn depends on what each person knows or likes: one may be more likely to upload species one likes or knows.
Because this data largely comes from crowdsourcing, it needs to be normalized, which is further explained in the Methods Description.
GBIF Citation:
Data Info:
Data:
GBIF.org (12 October 2024) GBIF Occurrence Download https://doi.org/10.15468/dl.sye4x3
Methods Description¶
Because the GBIF data can vary over space and time, it needs to be normalized to account for issues that would otherwise skew the plot (e.g. making it look like there are more observations in a certain area or time of year than is an accurate reflection of reality). The data was normalized by ecoregion samples, by month samples, and by area, so that the normalization covers both space and time. (This should help control for the number of active observers in each location and time of year.)
Before normalization can happen, the GBIF data was converted into a GeoDataFrame using latitude and longitude geometry.
Next, a spatial join was performed with parameters (how= 'inner', predicate= 'contains'). This identifies the Ecoregion for each observation.
Next, observations were grouped by ecoregion, selecting only month/ecoregion combinations that have more than one occurrence recorded (since a single occurrence could be an error). The .groupby() and .mean() methods were used to compute the mean occurrences by ecoregion and by month.
Lastly, the occurrences were divided by the mean occurrences by month AND the mean occurrences by ecoregion.
This normalizes the data, enabling a more accurate plot.
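The normalization steps above can be sketched with a toy example. The ecoregion/month labels and counts below are invented for illustration, not taken from the bobcat data; the division pattern mirrors the approach used later in this notebook.

```python
import pandas as pd

# Toy counts table: occurrences per (ecoregion, month) combination
counts = pd.DataFrame(
    {'occurrences': [3, 2, 8, 2]},
    index=pd.MultiIndex.from_tuples(
        [(16, 5), (16, 9), (32, 2), (32, 3)],
        names=['ecoregion', 'month']),
)

# Mean occurrences for each ecoregion (across months)
mean_by_ecoregion = counts.groupby('ecoregion').mean()
# Mean occurrences for each month (across ecoregions)
mean_by_month = counts.groupby('month').mean()

# Dividing by both means normalizes for sampling effort in space and time;
# pandas aligns each divisor on the matching index level
normalized = counts / mean_by_month / mean_by_ecoregion
print(normalized)
```

Each value ends up expressed relative to how heavily its ecoregion and its month were sampled overall, which is what lets busy observation hotspots and popular seasons be compared fairly.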
Access locations and times of Bobcat encounters¶
I used a database called the Global Biodiversity Information Facility (GBIF). GBIF is compiled from species observation data all over the world, and includes everything from museum specimens to photos taken by citizen scientists in their backyards.
Set up code to prepare for download¶
I will be getting data from a source called GBIF (Global Biodiversity
Information Facility). I need a package called
pygbif to access the data, which may not be included in my
environment. Install it by running the cell below:
%%bash
pip install pygbif
Requirement already satisfied: pygbif in /opt/conda/lib/python3.11/site-packages (0.6.4)
import os
import pathlib
import time
import zipfile
from getpass import getpass
from glob import glob
import pandas as pd
import pygbif.occurrences as occ
import pygbif.species as species
# Create data directory in the home folder
data_dir_bobcat = os.path.join(
# Home directory
pathlib.Path.home(),
# Earth analytics data directory
'earth-analytics',
'data',
# Project directory
'species_distribution_bobcat',
)
os.makedirs(data_dir_bobcat, exist_ok=True)
# Define the directory name for GBIF data
gbif_dir_bobcat = os.path.join(data_dir_bobcat, 'gbif_bobcat')
Check the location for the data_dir_bobcat¶
data_dir_bobcat
'/home/jovyan/earth-analytics/data/species_distribution_bobcat'
Register and log in to GBIF¶
I need a GBIF account to complete this challenge. Then, I run the following code to save my credentials on my computer.
Warning
My email address must match the email I used to sign up for GBIF!
Tip
If I accidentally enter my credentials wrong, I can set
reset_credentials=True instead of reset_credentials=False.
reset_credentials = False
# GBIF needs a username, password, and email
credentials = dict(
GBIF_USER=(input, 'GBIF username:'),
GBIF_PWD=(getpass, 'GBIF password'),
GBIF_EMAIL=(input, 'GBIF email')
)
for env_variable, (prompt_func, prompt_text) in credentials.items():
# Delete credential from environment if requested
if reset_credentials and (env_variable in os.environ):
os.environ.pop(env_variable)
# Ask for credential and save to environment
if not env_variable in os.environ:
os.environ[env_variable] = prompt_func(prompt_text)
Check and make sure the username is correct¶
Also double check that the password has been saved
os.environ['GBIF_USER']
'brglea'
!echo $GBIF_USER
brglea
'GBIF_PWD' in os.environ
True
Get the species key¶
- Replace the species_name with the name of the species you want to look up
- Run the code to get the species key
# Query species
species_info = species.name_lookup('lynx rufus', rank='SPECIES')
# Get the first result
first_result = species_info['results'][0]
# Get the species key (nubKey)
species_key = first_result['nubKey']
# Check the result
first_result['species'], species_key
('Lynx rufus', 2435246)
Download data from GBIF¶
Submit a request to GBIF
Replace csv_file_pattern with a string that will match any .csv file when used in the glob function. HINT: the character * represents any number of any characters except the file separator (e.g. /).
Add parameters to the GBIF download function, occ.download(), to limit the query to:
- observations
- from 2023
- with spatial coordinates.
Then, run the download. This can take a few minutes.
# Only download once
gbif_pattern = os.path.join(gbif_dir_bobcat, '*.csv')
if not glob(gbif_pattern):
# Only submit one request
if not 'GBIF_DOWNLOAD_KEY' in os.environ:
# Submit query to GBIF
gbif_query = occ.download([
"speciesKey = 2435246",
"hasCoordinate = True",
"year = 2023",
])
os.environ['GBIF_DOWNLOAD_KEY'] = gbif_query[0]
# Wait for the download to build
download_key = os.environ['GBIF_DOWNLOAD_KEY']
wait = occ.download_meta(download_key)['status']
while wait != 'SUCCEEDED':
wait = occ.download_meta(download_key)['status']
time.sleep(5)
# Download GBIF data
download_info = occ.download_get(
os.environ['GBIF_DOWNLOAD_KEY'],
path=data_dir_bobcat)
# Unzip GBIF data
with zipfile.ZipFile(download_info['path']) as download_zip:
download_zip.extractall(path=gbif_dir_bobcat)
# Find the extracted .csv file path (take the first result)
gbif_path = glob(gbif_pattern)[0]
Check the gbif_path¶
gbif_path
'/home/jovyan/earth-analytics/data/species_distribution_bobcat/gbif_bobcat/0010842-241007104925546.csv'
Load the GBIF data into Python¶
Load GBIF data
- Look at the beginning of the file I downloaded using the code below. The delimiter is a tab.
- Run the following code cell.
- Uncomment and modify the parameters of pd.read_csv() below until the data loads successfully and I have only the columns I want.
I can use the following code to look at the beginning of my file:
!head -n 2 $gbif_path
gbifID datasetKey occurrenceID kingdom phylum class order family genus species infraspecificEpithet taxonRank scientificName verbatimScientificName verbatimScientificNameAuthorship countryCode locality stateProvince occurrenceStatus individualCount publishingOrgKey decimalLatitude decimalLongitude coordinateUncertaintyInMeters coordinatePrecision elevation elevationAccuracy depth depthAccuracy eventDate day month year taxonKey speciesKey basisOfRecord institutionCode collectionCode catalogNumber recordNumber identifiedBy dateIdentified license rightsHolder recordedBy typeStatus establishmentMeans lastInterpreted mediaType issue 4953158569 50c9509d-22c7-4a22-a47d-8c48425ef4a7 https://www.inaturalist.org/observations/151699352 Animalia Chordata Mammalia Carnivora Felidae Lynx Lynx rufus SPECIES Lynx rufus (Schreber, 1777) Lynx rufus US California PRESENT 28eb1a3f-1c15-4a95-931a-4af90ecb574d 34.205588 -118.36292 28846.0 2023-03-16T03:21 16 3 2023 2435246 2435246 HUMAN_OBSERVATION iNaturalist Observations 151699352 Devon 2023-03-20T04:33:39 CC_BY_NC_4_0 Devon Devon 2024-10-12T11:10:13.765Z StillImage COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_ID_IGNORED
# Load the GBIF data
bobcat_gbif_df = pd.read_csv(
gbif_path,
delimiter='\t',
index_col='gbifID',
usecols=['gbifID', 'month', 'decimalLatitude', 'decimalLongitude']
)
bobcat_gbif_df.head()
| decimalLatitude | decimalLongitude | month | |
|---|---|---|---|
| gbifID | |||
| 4953158569 | 34.205588 | -118.362920 | 3 |
| 4953055247 | 36.537634 | -121.890603 | 10 |
| 4953008628 | 44.122360 | -119.848505 | 11 |
| 4952902566 | 34.270494 | -118.320036 | 9 |
| 4952869276 | 41.546645 | -72.608720 | 1 |
Import geopandas to use in next cell¶
import geopandas as gpd
Download and save ecoregion boundaries¶
The ecoregion boundaries take some time to download: they come in at about 150 MB. To use my time most efficiently, I cache the ecoregions data on the machine I'm working on so that I only have to download it once. To do that, I'll use conditionals, or code that adjusts what it does based on the situation.
Get ecoregion boundaries:
1. Find the URL for the ecoregion boundary Shapefile. I can get ecoregion boundaries from https://www.geographyrealm.com/terrestrial-ecoregions-gis-data/.
2. Change all the variable names to descriptive variable names, making sure to correctly reference variables I created before.
3. Run the cell to download and save the data.
# Set up the ecoregion boundary URL
ecoregions_url = (
"https://storage.googleapis.com/teow2016"
"/Ecoregions2017.zip")
# Set up a path to save the data on your machine
ecoregions_dir = os.path.join(data_dir_bobcat, 'resolve_ecoregions')
# Make the ecoregions directory
os.makedirs(ecoregions_dir, exist_ok=True)
# Join ecoregions shapefile path
ecoregions_path = os.path.join(ecoregions_dir, 'ecoregions.shp')
# Only download once
if not os.path.exists(ecoregions_path):
ecoregions_gdf = gpd.read_file(ecoregions_url)
ecoregions_gdf.to_file(ecoregions_path)
Make sure that worked! Use a bash command
called find to look for all the files in my project directory with
the .shp extension:
%%bash
find ~/earth-analytics/data/species_distribution_bobcat -name '*.shp'
/home/jovyan/earth-analytics/data/species_distribution_bobcat/resolve_ecoregions/ecoregions.shp
Load Ecoregions into Python¶
Load the ecoregion boundaries:
- Run the cell to load the data into Python.
- Make a quick plot with .plot() to make sure the download worked.
# Open up the ecoregions boundaries
ecoregions_gdf = gpd.read_file(ecoregions_path)
# Name the index so it will match the other data later on
ecoregions_gdf.index.name = 'ecoregion'
# Plot the ecoregions to check download
ecoregions_gdf.plot(edgecolor='black', color='lightgreen')
<Axes: >
ecoregions_gdf.head()
| OBJECTID | ECO_NAME | BIOME_NUM | BIOME_NAME | REALM | ECO_BIOME_ | NNH | ECO_ID | SHAPE_LENG | SHAPE_AREA | NNH_NAME | COLOR | COLOR_BIO | COLOR_NNH | LICENSE | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ecoregion | ||||||||||||||||
| 0 | 1.0 | Adelie Land tundra | 11.0 | Tundra | Antarctica | AN11 | 1 | 117 | 9.749780 | 0.038948 | Half Protected | #63CFAB | #9ED7C2 | #257339 | CC-BY 4.0 | MULTIPOLYGON (((158.7141 -69.60657, 158.71264 ... |
| 1 | 2.0 | Admiralty Islands lowland rain forests | 1.0 | Tropical & Subtropical Moist Broadleaf Forests | Australasia | AU01 | 2 | 135 | 4.800349 | 0.170599 | Nature Could Reach Half Protected | #70A800 | #38A700 | #7BC141 | CC-BY 4.0 | MULTIPOLYGON (((147.28819 -2.57589, 147.2715 -... |
| 2 | 3.0 | Aegean and Western Turkey sclerophyllous and m... | 12.0 | Mediterranean Forests, Woodlands & Scrub | Palearctic | PA12 | 4 | 785 | 162.523044 | 13.844952 | Nature Imperiled | #FF7F7C | #FE0000 | #EE1E23 | CC-BY 4.0 | MULTIPOLYGON (((26.88659 35.32161, 26.88297 35... |
| 3 | 4.0 | Afghan Mountains semi-desert | 13.0 | Deserts & Xeric Shrublands | Palearctic | PA13 | 4 | 807 | 15.084037 | 1.355536 | Nature Imperiled | #FA774D | #CC6767 | #EE1E23 | CC-BY 4.0 | MULTIPOLYGON (((65.48655 34.71401, 65.52872 34... |
| 4 | 5.0 | Ahklun and Kilbuck Upland Tundra | 11.0 | Tundra | Nearctic | NE11 | 1 | 404 | 22.590087 | 8.196573 | Half Protected | #4C82B6 | #9ED7C2 | #257339 | CC-BY 4.0 | MULTIPOLYGON (((-160.26404 58.64097, -160.2673... |
Convert the GBIF data to a GeoDataFrame¶
To plot the GBIF data, I need to convert it to a GeoDataFrame first.
This will make some special geospatial operations from geopandas
available, such as spatial joins and plotting.
- Convert the DataFrame to a GeoDataFrame.
- Run the code to get a GeoDataFrame of the GBIF data.
#Convert dataframe (df) into geo dataframe (gdf)
bobcat_gbif_gdf = (
gpd.GeoDataFrame(
bobcat_gbif_df,
geometry=gpd.points_from_xy(
bobcat_gbif_df.decimalLongitude,
bobcat_gbif_df.decimalLatitude),
crs="EPSG:4326")
# Select the desired columns
[['month', 'geometry']]
)
bobcat_gbif_gdf
| month | geometry | |
|---|---|---|
| gbifID | ||
| 4953158569 | 3 | POINT (-118.36292 34.20559) |
| 4953055247 | 10 | POINT (-121.8906 36.53763) |
| 4953008628 | 11 | POINT (-119.8485 44.12236) |
| 4952902566 | 9 | POINT (-118.32004 34.27049) |
| 4952869276 | 1 | POINT (-72.60872 41.54664) |
| ... | ... | ... |
| 4011868162 | 1 | POINT (-121.81632 37.43427) |
| 4011836346 | 1 | POINT (-122.44225 42.11348) |
| 4011733344 | 1 | POINT (-97.10298 32.58653) |
| 4011611239 | 1 | POINT (-121.7627 36.66162) |
| 4011547228 | 1 | POINT (-96.57154 32.53426) |
3657 rows × 2 columns
Store the new version of your dataframe for other notebooks as needed¶
%store ecoregions_gdf bobcat_gbif_gdf
Stored 'ecoregions_gdf' (GeoDataFrame) Stored 'bobcat_gbif_gdf' (GeoDataFrame)
Normalize Data¶
Identify the Ecoregion for Each Observation¶
Combine the ecoregions and the observations spatially using
a method called .sjoin(), which stands for spatial join.
- Perform a spatial join
- Identify the correct values for the how= and predicate= parameters of the spatial join.
- Select only the columns I will need for my plot.
- Run the code.
gbif_ecoregion_gdf = (
ecoregions_gdf
# Match the CRS of the GBIF data and the ecoregions
.to_crs(bobcat_gbif_gdf.crs)
# Find ecoregion for each observation
.sjoin(
bobcat_gbif_gdf,
how='inner',
predicate='contains')
# Select the required columns
[['month','ECO_NAME','gbifID', 'OBJECTID']]
# rename columns as needed
.reset_index()
.rename(columns={
'ECO_NAME': 'name',
'gbifID': 'observation_id',
'OBJECTID': 'object_id'})
)
gbif_ecoregion_gdf
| ecoregion | month | name | observation_id | object_id | |
|---|---|---|---|---|---|
| 0 | 16 | 10 | Allegheny Highlands forests | 4420901180 | 17.0 |
| 1 | 16 | 5 | Allegheny Highlands forests | 4116293969 | 17.0 |
| 2 | 16 | 7 | Allegheny Highlands forests | 4165965742 | 17.0 |
| 3 | 16 | 3 | Allegheny Highlands forests | 4535584198 | 17.0 |
| 4 | 16 | 2 | Allegheny Highlands forests | 4055045625 | 17.0 |
| ... | ... | ... | ... | ... | ... |
| 3592 | 833 | 6 | Northern Rockies conifer forests | 4438948603 | 839.0 |
| 3593 | 833 | 11 | Northern Rockies conifer forests | 4458403810 | 839.0 |
| 3594 | 833 | 2 | Northern Rockies conifer forests | 4067548544 | 839.0 |
| 3595 | 833 | 11 | Northern Rockies conifer forests | 4454006926 | 839.0 |
| 3596 | 833 | 9 | Northern Rockies conifer forests | 4414403437 | 839.0 |
3597 rows × 5 columns
Count the Observations in Each Ecoregion Each Month¶
Group observations by ecoregion
- Select only month/ecoregion combinations that have more than one occurrence recorded, since a single occurrence could be an error.
- Use the .groupby() and .mean() methods to compute the mean occurrences by ecoregion and by month.
- Run the code: it will normalize the number of occurrences by month and ecoregion.
bobcat_occurrence_df = (
gbif_ecoregion_gdf
# For each ecoregion, for each month...
.groupby(['ecoregion', 'month'])
# ...count the number of occurrences
.agg(occurrences=('observation_id', 'count'))
)
# Get rid of rare observations (possible misidentification?)
bobcat_occurrence_df = bobcat_occurrence_df[bobcat_occurrence_df.occurrences>1]
bobcat_occurrence_df
# Take the mean by ecoregion
mean_occurrences_by_ecoregion = (
bobcat_occurrence_df
.groupby(['ecoregion'])
.mean()
)
# Take the mean by month
mean_occurrences_by_month = (
bobcat_occurrence_df
.groupby(['month'])
.mean()
)
bobcat_occurrence_df
| occurrences | ||
|---|---|---|
| ecoregion | month | |
| 16 | 5 | 3 |
| 9 | 2 | |
| 10 | 2 | |
| 32 | 1 | 3 |
| 2 | 8 | |
| ... | ... | ... |
| 832 | 3 | 2 |
| 9 | 2 | |
| 833 | 2 | 2 |
| 10 | 2 | |
| 11 | 2 |
407 rows × 1 columns
mean_occurrences_by_ecoregion
| occurrences | |
|---|---|
| ecoregion | |
| 16 | 2.333333 |
| 32 | 6.000000 |
| 33 | 3.300000 |
| 34 | 2.750000 |
| 43 | 4.200000 |
| ... | ... |
| 783 | 8.416667 |
| 790 | 19.666667 |
| 793 | 3.000000 |
| 832 | 2.000000 |
| 833 | 2.000000 |
66 rows × 1 columns
mean_occurrences_by_month
| occurrences | |
|---|---|
| month | |
| 1 | 10.129032 |
| 2 | 9.062500 |
| 3 | 9.303030 |
| 4 | 8.382353 |
| 5 | 8.540541 |
| 6 | 8.800000 |
| 7 | 7.517241 |
| 8 | 7.000000 |
| 9 | 7.800000 |
| 10 | 7.550000 |
| 11 | 6.952381 |
| 12 | 10.483871 |
Normalize the Observations¶
Divide occurrences by the mean occurrences by month AND the mean occurrences by ecoregion
# Normalize by space and time for sampling effort
bobcat_occurrence_df['norm_occurrences'] = (
bobcat_occurrence_df
/mean_occurrences_by_month
/mean_occurrences_by_ecoregion
)
bobcat_occurrence_df
| occurrences | norm_occurrences | ||
|---|---|---|---|
| ecoregion | month | ||
| 16 | 5 | 3 | 0.150542 |
| 9 | 2 | 0.109890 | |
| 10 | 2 | 0.113529 | |
| 32 | 1 | 3 | 0.049363 |
| 2 | 8 | 0.147126 | |
| ... | ... | ... | ... |
| 832 | 3 | 2 | 0.107492 |
| 9 | 2 | 0.128205 | |
| 833 | 2 | 2 | 0.110345 |
| 10 | 2 | 0.132450 | |
| 11 | 2 | 0.143836 |
407 rows × 2 columns
Store the new version of your dataframe for other notebooks as needed¶
%store bobcat_occurrence_df
Stored 'bobcat_occurrence_df' (DataFrame)
Plot Data¶
Import Packages Needed¶
Import packages for making interactive maps with vector data, as well as calendar in order to get month names that will be used when creating the plot.
# Get month names
import calendar
# Libraries for Dynamic mapping
import cartopy
import cartopy.feature as cf
import cartopy.crs as ccrs
import geopandas as gpd
import geoviews as gv
import geoviews.feature as gf
import holoviews as hv
import hvplot.pandas
import panel as pn
Create a Simplified GeoDataFrame for Plotting¶
The code below will
streamline plotting with hvplot by simplifying the geometry,
projecting it to a Mercator projection that is compatible with
geoviews, and cropping off areas in the Arctic.
Simplify and reproject the ecoregions:
- Simplify the ecoregions with .simplify(), and save the result back to the geometry column.
- Change the Coordinate Reference System (CRS) to Mercator with .to_crs(ccrs.Mercator()).
- Use the plotting code that is already in the cell to check that the plotting runs quickly (less than a minute) and looks the way I want.
# Simplify the geometry to speed up processing
ecoregions_gdf.geometry = ecoregions_gdf.simplify(
.01, preserve_topology=False
)
# Change the CRS to Mercator for mapping
ecoregions_gdf = ecoregions_gdf.to_crs(ccrs.Mercator())
# Check that the plot runs in a reasonable amount of time
ecoregions_gdf.hvplot(
x='Longitude',
y='Latitude',
geo=True,
crs=ccrs.Mercator()
)
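The explanation above mentions cropping off areas in the Arctic, which the cell does not explicitly do. One way to crop by latitude is geopandas' .cx coordinate-slicing indexer, applied before reprojecting (while coordinates are still in degrees). This is a hypothetical sketch on a toy GeoDataFrame; the points and latitude bounds are illustrative assumptions, not taken from the project data.

```python
import geopandas as gpd
from shapely.geometry import Point

# Toy GeoDataFrame in lat/lon (EPSG:4326); points are made up
gdf = gpd.GeoDataFrame(
    {'name': ['mid-latitude', 'arctic']},
    geometry=[Point(-105.0, 40.0), Point(-105.0, 85.0)],
    crs='EPSG:4326',
)

# .cx slices by bounding box: keep geometries between 60°S and 70°N
# (the bounds here are assumptions for illustration)
cropped = gdf.cx[:, -60:70]
print(cropped['name'].tolist())
```

In this notebook's pipeline, a slice like this would go on ecoregions_gdf before the .to_crs(ccrs.Mercator()) call, since latitude bounds in degrees are only meaningful in a geographic CRS.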